Statistical Methods for the Recognition and Understanding of Speech
Abstract
Statistical methods for speech processing refer to a general methodology in which knowledge about both a speech signal and the language that it expresses, along with practical uses of that knowledge for specific tasks or services, is developed from actual realizations of speech data through a well-defined mathematical and statistical formalism. For more than 20 years, this basic methodology has produced many advances and new results, particularly for recognizing and understanding speech and natural language by machine. In this article, we focus on several important statistical methods, e.g., one based primarily on the hidden Markov model (HMM) formulation, which has gained widespread acceptance as the dominant technique, and one related to the use of statistics for characterizing word co-occurrences. To recognize and understand speech, the speech signal is first processed by an acoustic processor, which converts the waveform to a set of spectral feature vectors that characterize the time-varying properties of the speech sounds, and then by a linguistic decoder, which decodes the feature vectors into a word sequence that is valid according to the word lexicon and task grammar associated with the speech recognition or understanding task. The hidden Markov model approach is mainly used for acoustic modeling, that is, assigning probabilities to acoustic realizations of a sequence of sounds or words, and a statistical language model is used to assign probabilities to sequences of words in the language. A Bayesian approach is used to find the word sequence with the maximum a posteriori probability over all possible sentences in the task language. This search problem is often astronomically large for large vocabulary speech understanding problems, and thus the speech-to-text decoding process often requires inordinate amounts of computing power to solve by heuristic methods.
Fortunately, using results from the field of finite state automata theory, we can reduce the computational burden of the search by orders of magnitude, thereby enabling exact solutions in computationally feasible times for large speech recognition problems.

1. This article is based on a series of lectures on Challenges in Speech Recognition by one of the authors (LRR) and his many colleagues at AT&T Labs Research, most especially Dr. Mazin Rahim, who contributed to the presentation and figures used throughout this article. We thank Dr. Rahim for his help and support.
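
The Bayesian decoding rule mentioned in the abstract picks the word sequence W that maximizes P(W | X) for the observed feature vectors X. By Bayes' rule this is the same as maximizing P(X | W) P(W), since P(X) does not depend on W. The sketch below illustrates that rule on a toy candidate list. It is not the authors' decoder: the candidate sentences, all scores, and the function name `map_decode` are hypothetical, chosen only to show how the acoustic score and the language model prior combine.

```python
import math

# Hypothetical toy scores; none of these numbers come from the article.
# log P(X | W): acoustic model log-likelihood of the observed features X.
acoustic_log_likelihood = {
    "recognize speech": -12.0,
    "wreck a nice beach": -11.5,
    "recognized peach": -14.0,
}

# P(W): statistical language model prior for each candidate sentence.
language_model_prob = {
    "recognize speech": 0.020,
    "wreck a nice beach": 0.001,
    "recognized peach": 0.002,
}


def map_decode(acoustic, prior):
    """Return (W, score) maximizing log P(X | W) + log P(W) over candidates."""
    best, best_score = None, -math.inf
    for sentence, log_lik in acoustic.items():
        score = log_lik + math.log(prior[sentence])
        if score > best_score:
            best, best_score = sentence, score
    return best, best_score


best_sentence, best_log_score = map_decode(
    acoustic_log_likelihood, language_model_prob
)
# The acoustic score alone would favor a different candidate.
acoustic_only = max(acoustic_log_likelihood, key=acoustic_log_likelihood.get)

print("MAP choice:", best_sentence)
print("Acoustic-only choice:", acoustic_only)
```

With these made-up numbers the language model prior overturns the acoustically best candidate. Exhaustive enumeration like this is only workable for a handful of candidates; for realistic vocabularies the space of sentences is far too large, which is the search problem the abstract describes.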
Publication date: 2004